How to mine data on the Web. (includes related article on data collection
techniques) (Drilling for Data) (Internet/Web/Online Service
Information) (Cover Story)

Mena, Jesus 

Databased Web Advisor, v15, n7, p32(5)
July, 1997 

DOCUMENT TYPE: Cover Story ISSN: 1090-6436 LANGUAGE: English 

RECORD TYPE: Fulltext; Abstract 

WORD COUNT: 2450 LINE COUNT: 00223 

. . . calibrate their value and loyalty, enabling you to subsequently
formulate unique ads and marketing strategies. Data mining is based in
part on statistics and on machine learning, a field of artificial
intelligence designed to emulate human perception. Unlike database
query programs, report generators, or statistical packages, data mining 
tools perform analysis automatically and formulate solutions in plain 
English as conjunctions or rules. The tools not only find patterns in 
databases automatically, they deliver solutions in... 
DATA MINING APPLICATION WITH IMPROVED DATA MINING ALGORITHM SELECTION
APPLICATION D'EXPLORATION EN PROFONDEUR DE DONNEES POUVANT AMELIORER LE
CHOIX D'UN ALGORITHME D'EXPLORATION EN PROFONDEUR DE DONNEES
Main International Patent Class: G06F-015/18
Fulltext Availability: 
Detailed Description 

Detailed Description 

database servers through ODBC (Open Database Connectivity) using a pool
of persistent database connections.

[0087] The data mining software application described herein will
operate in a general purpose computer. A computer is generally...
Data mining system and method called MintoMine for extracting data,
from any data source, includes logical connection to extensible parsing
environment that supports customized reverse-polish plug-in operators
Abstract (Basic) : WO 200365179 A2

NOVELTY - A system for the extraction of data from a variety of 
sources into a single unifying ontology, comprising: an ontology based 
environment, where the environment includes an ontology description 
language (ODL) and a run-time accessible types system; logically 
connected to, an extensible parsing environment, where the parsing
environment supports customized reverse-polish plug-in operators; 
logically connected to, a configurable outer parser capable of 
accepting a BNF (or equivalent) specification describing the source 
data format; an embedded inner parser capable of executing statements 
and performing actions directly on the objects and types described by 
the system ontology. 

DETAILED DESCRIPTION - AN INDEPENDENT CLAIM is also included for a 
method for extracting data from a variety of sources into a single 
unifying ontology. 

The unifying ontology is implemented by the Ontology Patent, which
introduced an ontology-based language that is an extension of the C
language.

USE - Data mining system and method called MintoMine for 
extracting data, collection referencing and cross referencing all 
extracted records from any data source including bulk extraction of 
free-form data from sources, such as CD-ROMs, and the Internet.

ADVANTAGE - Provides rapid data mining where the data mining 
system designer is free to evolve an appropriate ontology (global 
model) as dictated by actual use and by the needs of the system users. 
Changes are automatically and instantaneously reflected throughout the 
system allowing rapid evolution of the system. The system enables the 
software environment to be rapidly changed and extended, predominantly 
without the need for code modification, according to requirements, and 
without the fear of introducing new coding errors and bugs in the 
process. Moreover this system can, through ontology, unify data from a 
wide variety of different and incompatible sources and databases into a 
single whole where the data is unified and searchable without 
Correlation rule generation method for data mining system, involves
detecting overlapping state and reliability level of tables, so as to
designate tables with desired search ranges and weightages
Abstract (Basic) : JP 2001243072 A

NOVELTY - A list of tables representing correlation rule with 
weightages is produced. The partial convergence between left and right 
side components of rule is judged. Based on the judgment, number of 
tables satisfying component overlapping is detected. The reliability 
level of each table is determined, to designate the tables with desired 
search range and weightages. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for 
correlation rule generating system. 

USE - For generating correlation rules utilized in data mining 
system. 

ADVANTAGE - Enables retrieving the desired information easily, by 
adding weightages to each data item. 

DESCRIPTION OF DRAWING(S) - The figure shows the flowchart
representing data mining process. (Drawing includes non-English
language text).

pp; 21 DwgNo 1/12 
Relational artificial intelligence system - includes knowledge
acquisition unit, which discovers knowledge from spreadsheet-formed 
databases to generate bases using inductive learning, while reasoning 
unit reasons about generated bases to predict effect for future data 
readings 

Patent Assignee: CHANG H H (CHAN-I) 
Inventor: CHANG H H 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 
Abstract (Basic) : US 5473732 A

The system performs automatic knowledge acquisition from a set of 
data records, generates a set of relational knowledge bases and 
performs inferences on the set of relational knowledge bases to obtain 
inference results based on a set of required data. The system comprises an
input/output port for acquiring data and generating output. A computer 
with memory stores data and software programs. A set of relational 
inductive engines representing a set of executable programs is stored 
in the memory of the computer. The program automatically discovers 
knowledge from the set of data records and generates the set of 
relational knowledge bases. Each relational knowledge base comprises a
set of knowledge relations. 

A set of relational inference engines is also stored in the 
computer memory for reasoning about the set of relational knowledge 
bases and for obtaining said inference results. The inference results
are determined based on the set of required data records, which store 
all permissible values in fields of each attribute of the decision 
relations in the memory. A code is assigned to each permissible value. 
The permissible value is then translated to code. A set of code decision
relations is created, and then the code is translated to the 
permissible values. 
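
The value-to-code translation described above can be illustrated with a
minimal Python sketch. It is not taken from US 5473732: the attribute names,
sample values, and helper names (build_codebook, encode, decode) are
hypothetical, and the sketch assumes only that each attribute has a finite
list of permissible values and that each value receives an integer code.

```python
def build_codebook(permissible):
    """Assign an integer code to every permissible value of every attribute."""
    return {
        attr: {value: code for code, value in enumerate(values)}
        for attr, values in permissible.items()
    }


def encode(record, codebook):
    """Translate a record of permissible values into their codes."""
    return {attr: codebook[attr][value] for attr, value in record.items()}


def decode(coded, codebook):
    """Translate a coded record back into the permissible values."""
    reverse = {
        attr: {code: value for value, code in mapping.items()}
        for attr, mapping in codebook.items()
    }
    return {attr: reverse[attr][code] for attr, code in coded.items()}


# Hypothetical attributes and values, chosen only for illustration.
permissible = {"weather": ["sunny", "rainy"], "decision": ["go", "stay"]}
codebook = build_codebook(permissible)

record = {"weather": "rainy", "decision": "stay"}
coded = encode(record, codebook)      # values replaced by integer codes
restored = decode(coded, codebook)    # codes mapped back to the values
```

Working on integer codes rather than on the raw strings is one plausible
reason such a design could be fast; the abstract itself does not give the
reason.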

ADVANTAGE - Every component in system is relational. Data are 
organised in spreadsheet forms, thus system operates with high 
efficiency and speed. 

Dwg. 1/6 
Automatic expert system containing automatic inference engine - has
double loop program which processes inference independent of input 
knowledge base size and content 

Patent Assignee: CHANG H H (CHAN-I) 

Inventor: CHANG H H 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 
Priority Applications (No Type Date) : US 92942976 A 19920910
Patent Details: 

Patent No    Kind  Lan  Pg  Main IPC     Filing Notes
US 5263126   A          19  G06F-015/18

Abstract (Basic) : US 5263126 A 

The expert system has new types of knowledge bases including a 
stored knowledge base (11) in the form of an array, an input knowledge 
base (12) in the form of a truth table or in some other user-defined 
forms. A built-in computer program, or transfer engine, transfers the
input knowledge base (3) to the stored knowledge base. A second, or 
inference engine (2) reasons with the stored knowledge base using a 
double loop. 

The double loop always processes the inference automatically 
independent of the input knowledge base size and content and without
the need for compilation.

USE/ADVANTAGE - Knowledge based expert system. User friendly 
format. Fast operation. 
Publication Date: 1993 Country of Publication: USA 684 pp.
U.S. Copyright Clearance Center Code: 0 7803 1257 0/93/$3.00
Conference Sponsor: IEEE
Language: English Document Type: Conference Paper (PA)
Treatment: Applications (A); Practical (P) 

Abstract: An architecture for heterogeneous distributed knowledge 
management is presented. This architecture is based on a blackboard 
architecture and built around three different kinds of processes: 
procedural, exemplar, and heuristic. Each process contributes a partial
solution derived from the task assigned to it in the general flow diagram
and interacts with other processes through a client-server computing model.
The procedural processes are used for handling procedural preprocessing
operations. The exemplar processes are used in the form of connectionist
models to recognize different patterns resulting from the preprocessing
phase. The heuristic processes are implemented in the form of
production rules to configure the chosen connectionist models based on
some previously extracted features. (8 Refs)
Subfile: B C 

Descriptors: blackboard architecture; client-server systems; distributed 
processing; knowledge based systems; neural nets 

Identifiers: feature extraction; pattern recognition; client-server 
Definitions of data mining on the Web:



An information extraction activity whose goal is to discover hidden facts contained in databases. 
Using a combination of machine learning, statistical analysis, modeling techniques and database 
technology, data mining finds patterns and subtle relationships in data and infers rules that allow the 
prediction of future results. Typical applications include market segmentation, customer profiling, 
fraud detection, evaluation of retail promotions, and credit risk analysis.
www.twocrows.com/glossary.htm



Nontrivial extraction of implicit, previously unknown and potentially useful information from data, or 
the search for relationships and global patterns that exist in databases. [Bob Klevecz "The Whole 
EST Catalog" Scientist 12 (2): 22 Jan 18 1999] Algorithms & data analysis glossary
www.genomicglossaries.com/content/chemoinformatics_gloss.asp

As the term suggests, data mining is the analysis of data to establish relationships and identify 
patterns. 

practice.findlaw.com/glossary.html



The process of analyzing large amounts of data in order to extract new kinds of useful information 
(such as implicit relationships between different pieces of information). 
www.rlg.org/redlightgreen/glossary.html

The process of using statistical techniques to discover subtle relationships between data items, and 
the construction of predictive models based on them. The process is not the same as just using an 
OLAP tool to find exceptional items. Generally, data mining is a very different and more specialist 
application than OLAP, and uses different tools from different vendors. Normally the users are 
different, too. OLAP vendors have had little success with their data mining efforts. 
www.olapreport.com/glossary.htm 



The analysis of database information; this usually involves identifying specific product information 
and codes, cleansing data and re-formatting it. 
www.isourceonline.com/research/glossary/index.asp 



The process of discovering previously unknown information from the data in data warehouses. 
www.upstreamcio.com/glossary.asp

Data mining entails analyzing information for previously undiscovered correlations between two 
markets. Data mining connections can be made through associations (baseball fans also watch 
football), sequences (buying wood and then buying paint), forecasting (based on patterns found), 
and clustering (grouping information in a new way). 
www.ataconnect.org/htdocs/facts/glossary/dk.htm 

The process of analyzing data to identify patterns or relationships. 
www.iomega.com/support/documents/11240.html

Finding unexpected relationships in a data set. Similar to exploratory data analysis. Vitalnet is 
excellent at data mining. Some say data dredging, since if you look long enough, you will always 
find unusual events just by chance. 
www.ehdp.com/vitalnet/glossary.htm 
A type of application with built-in proprietary algorithms that sort, rank, and perform calculations on a
specified and often large data set, producing visualizations that reveal patterns which may not have
been evident from mere listings or summaries.
www.sims.berkeley.edu/courses/is213/s99/Projects/P9/web_site/glossary.htm

A technique using software tools geared for the user who typically does not know exactly what he's 
searching for, but is looking for particular patterns or trends. Data mining is the process of sifting 
through large amounts of data to produce data content relationships. This is also known as data 
surfing. 

www.etfinancial.com/dataglossary.htm 

The function of database applications that probe for hidden or undiscovered patterns in given 
collections of data. These applications use pattern recognition technologies as well as statistical and 
mathematical techniques and can have a key impact on the return on investment (ROI) for a 
technology expenditure upon discovering marketing or customer service data about one's clients. 
Data mining is not simple, and most companies have not yet actively mined their data, though nearly 
all have plans to do so in the future. 
www.eccs.uk.com/resources/glossary.asp 

The process of analyzing large volumes of data using pattern recognition or knowledge discovery 
techniques to identify meaningful trends and relationships represented in data in large databases. 
www.cio.gov.bc.ca/other/daf/IRM_Glossary.htm



The practice of searching databases for hidden patterns of data which reveal additional information 
to create detailed profiles which may or may not be sold to third-parties. 
www.kgb.org/kgb/glossary.html 

Extraction of useful information from data sets. Data mining serves to find information that is hidden 
within the available data. 

www.pcai.com/web/glossary/pcai_d_f_glossary.html

Refers to the many methods of data analysis (often using sophisticated algorithms) to answer
open-ended questions about your data. Data mining is easily used by non-technical people and provides
information in real time. 

www.netplusmarketing.com/resources_glos.cfm 

Category of DBMS applications that seek to find new information and relationships within multiple, 
often heterogeneous, legacy data stores; for example, searching and analyzing customer sales 
transaction detail to determine buying habits by ZIP code or other demographic criteria. See Active 
Data Warehousing page. 
www.whamtech.com/glossary.htm

The comparison and study of large databases in order to discover new data relationships. Mining a 
clinical database may produce new insights on outcomes, alternate treatments or effects of 
treatment on different races and genders. 
pip.med.umich.edu/glossary/index3.htm

A technique to analyse data in very large databases. Analysis can reveal trends and patterns and 
can be used to improve vital business processes. 
www.knowledgepoint.com.au/starting_out/glossary.htm 

Searching, accessing, extracting and manipulating data in databases. (French: exploration en
profondeur de données.)

www.nrcan.gc.ca/cfs-scf/science/prodserv/kmglossary_e.html 

A process of reviewing information in a database and making new connections among the 
information. 

www.vnulearning.com/kmwp/glossary.html
Analyzing information in a database using tools that look for trends or anomalies without knowledge
of the data's meaning. Data mining is crucial in CRM strategies, particularly in e-commerce.
www.personalization.org/GlossaryofTerms3.html

A technique of sifting through vast amounts of data to discover trends in customer needs, buying 
patterns, profitability, and other critical business measurements. Usually requires the construction of 
a data warehouse. 

www.impact21group.com/glossary.htm

Applications that retrieve data over the grid and apply an algorithm; under development. 
www.ipg.nasa.gov/ipgflat/aboutipg/glossary.html 

Data processing using sophisticated data search capabilities and statistical algorithms to discover
patterns and correlations in large preexisting databases; a way to discover new meaning in data 
www.cogsci.princeton.edu/cgi-bin/webwn 
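
Several of the definitions above mention associations (for example,
"baseball fans also watch football"). A minimal Python sketch of how such an
association might be scored follows. The viewer data is invented for
illustration, and the function names (support, confidence) are hypothetical
helpers that apply the standard support and confidence measures for
association rules; none of the glossaries quoted above prescribes this method.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Invented sample: the sports each of four viewers watches.
viewers = [
    {"baseball", "football"},
    {"baseball", "football", "golf"},
    {"baseball"},
    {"golf"},
]

# Two of four viewers watch both sports.
rule_support = support(viewers, {"baseball", "football"})
# Two of the three baseball viewers also watch football.
rule_confidence = confidence(viewers, {"baseball"}, {"football"})
```

On a sample this small a high confidence can arise by chance, which is the
"data dredging" caveat raised in one of the definitions above.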